Modeling pronunciation variation using artificial neural networks for English spontaneous speech
نویسندگان
چکیده
Pronunciation variation in conversational speech has caused significant amount of word errors in large vocabulary automatic speech recognition. Rule-based approaches and decision-tree based approaches have been previously proposed to model pronunciation variation. In this paper, we report our work on modeling pronunciation variation using artificial neural networks (ANN). The results we achieved are significantly better than previously published ones on two different corpora, indicating that ANN may be better suited for modeling pronunciation variation than other statistical models that have been previously investigated. Our experiments indicate that binary distinctive features can be used to effectively represent the phonological context. We also find that including pitch accent feature in input improves the prediction of pronunciation variation on a ToBI-labeled subset of the Switchboard corpus.
منابع مشابه
Pronunciation Modeling In Speech Synthesis
This dissertation investigates the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling t...
متن کاملNeural network based pronunciation modeling with applications to speech recognition
We propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (realized pronunciations) from canonical pronunciations. This method can generate multiple forms of realized pronunciations using the pronunciation network. Experimental results on spontaneous speech show that the automatically-derived pro...
متن کاملFeature-based Pronunciation Modeling for Speech Recognition
We present an approach to pronunciation modeling in which the evolution of multiple linguistic feature streams is explicitly represented. This differs from phone-based models in that pronunciation variation is viewed as the result of feature asynchrony and changes in feature values, rather than phone substitutions, insertions, and deletions. We have implemented a flexible feature-based pronunci...
متن کاملAutomatic generation of pronunciation lexicons for Mandarin spontaneous speech
Pronunciation modeling for large vocabulary speech recognition attempts to improve recognition accuracy by identifying and modeling pronunciations that are not in the ASR systems pronunciation lexicon. Pronunciation variability in spontaneous Mandarin is studied using the newly created CASS corpus of phonetically annotated spontaneous speech. Pronunciation modeling techniques developed for Engl...
متن کاملAutomatic generation of multiple pronunciations based on neural networks
We propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (alternative pronunciations) from the canonical pronunciation. This method can generate multiple forms of alternative pronunciations using the pronunciation network. For generating a sophisticated alternative pronunciation dictionary, two ...
متن کامل